Adaptive Feature Selection for Object Tracking with Particle Filter
Object tracking is an important topic in the field of computer vision. Commonly used color-based trackers rely on a fixed set of color features such as RGB or HSV and, as a result, fail to adapt to changing illumination conditions and background clutter. These drawbacks can be overcome to an extent by an adaptive framework that selects, for each frame of a sequence, the features that best discriminate the object from the background. In this paper, we embed such an adaptive feature selection method into a particle filter and show that the resulting tracker is robust to lighting changes and background distractions. Experiments also show that the proposed method outperforms other approaches.
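The abstract does not say how candidate features are scored. A common criterion for this kind of per-frame selection is the variance ratio of the object/background log-likelihood, as in Collins et al.'s online discriminative feature selection. The sketch below assumes that criterion, a hypothetical set of five RGB-derived candidate features, and pixel values in [0, 1]. It covers only the selection step; in a full particle-filter tracker, the winning feature's histogram likelihood would then be used to weight the particles.

```python
import math

# Hypothetical candidate features: each maps an (r, g, b) pixel in [0, 1] to [0, 1].
CANDIDATES = {
    "R": lambda r, g, b: r,
    "G": lambda r, g, b: g,
    "B": lambda r, g, b: b,
    "I": lambda r, g, b: (r + g + b) / 3,
    "R-G": lambda r, g, b: (r - g + 1) / 2,
}

def histogram(values, bins=8):
    """Normalized histogram of scalar values in [0, 1]."""
    counts = [0] * bins
    for v in values:
        counts[min(max(int(v * bins), 0), bins - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]

def weighted_variance(weights, values):
    """Variance of `values` under the discrete distribution `weights`."""
    mean = sum(w * x for w, x in zip(weights, values))
    return sum(w * (x - mean) ** 2 for w, x in zip(weights, values))

def variance_ratio(obj_vals, bg_vals, bins=8, eps=1e-3):
    """Higher means the feature separates object from background better."""
    p, q = histogram(obj_vals, bins), histogram(bg_vals, bins)
    # Per-bin log-likelihood ratio; eps avoids log(0) for empty bins.
    llr = [math.log(max(pi, eps) / max(qi, eps)) for pi, qi in zip(p, q)]
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    # Spread across both classes relative to spread within each class.
    return weighted_variance(mix, llr) / (
        weighted_variance(p, llr) + weighted_variance(q, llr) + eps
    )

def select_feature(obj_pixels, bg_pixels):
    """Return the best candidate for this frame and all candidate scores."""
    scores = {
        name: variance_ratio([f(*px) for px in obj_pixels],
                             [f(*px) for px in bg_pixels])
        for name, f in CANDIDATES.items()
    }
    return max(scores, key=scores.get), scores
```

For a red object on a green background, the R, G and R-G features separate the two regions perfectly and score highly, while B and intensity, which take the same values in both regions, score zero.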
ImageNet Large Scale Visual Recognition Challenge
The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.
Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig. 16), a list of queries used for obtaining object detection images (Appendix C), and some additional references.
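The abstract compares machine and human accuracy without naming the metric. ILSVRC image classification is scored by top-5 error: a prediction counts as correct if the ground-truth class is among the algorithm's five highest-ranked labels. A minimal sketch of that computation, assuming per-image class scores are held in plain dictionaries; the function names are hypothetical, not part of any official toolkit.

```python
def top_k_labels(scores, k=5):
    """Return the k labels with the highest scores for one image."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def top_k_error(all_scores, true_labels, k=5):
    """Fraction of images whose true label is missing from the top-k predictions."""
    if len(all_scores) != len(true_labels):
        raise ValueError("need one score dict per ground-truth label")
    if not true_labels:
        raise ValueError("no images to evaluate")
    misses = sum(
        1
        for scores, label in zip(all_scores, true_labels)
        if label not in top_k_labels(scores, k)
    )
    return misses / len(true_labels)
```

Setting `k=1` gives ordinary top-1 error from the same scores.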
Evaluating Multimedia Features and Fusion for Example-Based Event Detection
Multimedia event detection (MED) is a challenging problem because of the heterogeneous content and variable quality found in large collections of Internet videos. To study the value of multimedia features and fusion for representing and learning events from a set of example video clips, we created SESAME, a system for video SEarch with Speed and Accuracy for Multimedia Events. SESAME includes multiple bag-of-words event classifiers based on single data types: low-level visual, motion, and audio features; high-level semantic visual concepts; and automatic speech recognition. Event detection performance was evaluated for each event classifier. The performance of low-level visual and motion features was improved by the use of difference coding. The accuracy of the visual concepts was nearly as strong as that of the low-level visual features. Experiments with a number of fusion methods for combining the event detection scores from these classifiers revealed that simple fusion methods, such as the arithmetic mean, perform as well as or better than more complex fusion methods. SESAME’s performance in the 2012 TRECVID MED evaluation was one of the best reported.
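The abstract reports that the arithmetic mean matches or beats more complex fusion methods, but does not give SESAME's implementation. A minimal late-fusion sketch under stated assumptions: each single-modality classifier emits a per-video score on a comparable scale, and a video missing from a classifier (for example, one with no speech for ASR) is averaged over only the classifiers that did score it. The classifier names, video ids and scores are hypothetical.

```python
def mean_fusion(scores_by_classifier):
    """Late fusion: average each video's scores over the classifiers that scored it.

    scores_by_classifier maps a classifier name to {video_id: score}.
    Scores are assumed to be on a comparable scale (e.g. already normalized).
    """
    totals, counts = {}, {}
    for scores in scores_by_classifier.values():
        for video, s in scores.items():
            totals[video] = totals.get(video, 0.0) + s
            counts[video] = counts.get(video, 0) + 1
    return {v: totals[v] / counts[v] for v in totals}

def rank_videos(fused):
    """Video ids sorted from most to least likely to contain the event."""
    return sorted(fused, key=fused.get, reverse=True)

if __name__ == "__main__":
    # Hypothetical per-classifier detection scores for one event.
    scores = {
        "visual": {"v1": 0.8, "v2": 0.3, "v3": 0.5},
        "motion": {"v1": 0.6, "v2": 0.4, "v3": 0.7},
        "asr": {"v2": 0.2},  # only v2 contains recognized speech
    }
    print(rank_videos(mean_fusion(scores)))
```

Treating missing scores as zero instead would penalize videos that lack a modality; which choice works better is an empirical question the abstract does not settle.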
Fusion Techniques in Biomedical Information Retrieval
For difficult cases, clinicians usually rely on their experience and on information found in textbooks to determine a diagnosis. Now that much medical knowledge is available in digital form, computer tools can help by supplying the relevant information. A biomedical search system such as the one developed in the Khresmoi project (which this chapter partially reuses) aims to fulfil the information needs of physicians. This chapter concentrates on information needs for medical cases that contain a wide variety of data, from free text and structured data to images. We compare fusion techniques for combining these information sources to retrieve cases similar to a given example case. This can supply physicians with answers to problems similar to the one they are analyzing and can help in diagnosis and treatment planning.
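The abstract does not list which fusion techniques are compared. Two classic score-fusion rules often used as baselines in information retrieval are CombSUM and CombMNZ (Fox and Shaw), usually applied after min-max normalization of each source's scores. A minimal sketch, assuming each information source (for example a free-text index and an image index) returns a dict mapping a case id to a retrieval score; the case ids and scores are hypothetical, and this is not necessarily the chapter's own method.

```python
def min_max(run):
    """Rescale one source's scores to [0, 1] so sources become comparable."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}

def comb_sum(runs):
    """CombSUM: sum of normalized scores across all sources."""
    fused = {}
    for run in runs:
        for doc, s in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + s
    return fused

def comb_mnz(runs):
    """CombMNZ: CombSUM multiplied by the number of sources that returned the doc."""
    hits = {}
    for run in runs:
        for doc in run:
            hits[doc] = hits.get(doc, 0) + 1
    return {doc: s * hits[doc] for doc, s in comb_sum(runs).items()}

if __name__ == "__main__":
    # Hypothetical scores from a text search and an image search.
    text_run = {"case1": 10.0, "case2": 5.0, "case3": 0.0}
    image_run = {"case2": 4.0, "case4": 2.0}
    fused = comb_mnz([text_run, image_run])
    print(sorted(fused, key=fused.get, reverse=True))
```

CombMNZ rewards cases retrieved by several sources, which suits multimodal cases where agreement between text and image evidence is informative.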
Finding Semantically Related Videos in Closed Collections
Modern newsroom tools offer advanced functionality for automatic and semi-automatic collection of content from web and social media sources to accompany news stories. However, content collected in this way tends to be unstructured and may include irrelevant items. An important step in the verification process is to organize this content, both with respect to what it shows and with respect to its origin. This chapter presents our efforts in this direction, which resulted in two components. One aims to detect semantic concepts in video shots, to help annotate and organize content collections. We implement a system based on deep learning, featuring a number of advances and adaptations of existing algorithms to increase performance on the task. The other component aims to detect logos in videos in order to identify their provenance. We present our progress from a keypoint-based detection system to a system based on deep learning.